Gilles Louppe, January 2016.
This notebook introduces the carl.distributions module. It illustrates how distributions can be defined and composed, and how their parameters can be fit to data.
Disclaimer: This module is meant to be a proof of concept that a full Python+Theano based RooFit equivalent could realistically be achieved. At the moment, the module remains very experimental and should not be used in production.
In [1]:
%matplotlib inline
from matplotlib import pyplot as plt
import numpy as np
import theano
import theano.tensor as T
All base distributions from carl.distributions implement the DistributionMixin interface, providing the following methods:
- pdf(X) for computing the probability density at X;
- nll(X) for computing -log(pdf(X));
- rvs(n_samples) for sampling data;
- cdf(X) for computing the cumulative distribution function at X (for 1D distributions);
- ppf(p) for computing the p-th quantile (for 1D distributions).

Parameters of a distribution (e.g., the location mu of a normal distribution) are specified at initialization.
In [2]:
from carl.distributions import Normal
p = Normal(mu=0.0, sigma=1.0)
In [3]:
reals = np.linspace(-5, 5, num=1000)
pdf = p.pdf(X=reals.reshape(-1, 1)) # X is a 2D array of shape n_samples x n_features
print(pdf[:10])
In [4]:
plt.plot(reals, pdf, label="pdf(x)")
plt.legend(loc="best")
plt.show()
In [5]:
reals = np.linspace(-5, 5, num=1000)
plt.plot(reals, p.nll(reals.reshape(-1, 1)), label="-log(pdf(x))")
plt.legend(loc="best")
plt.show()
In [6]:
reals = np.linspace(-5, 5, num=1000)
plt.plot(reals, p.cdf(reals.reshape(-1, 1)), label="cdf(x)")
plt.legend(loc="best")
plt.show()
In [7]:
reals = np.linspace(0, 1, num=1000)
plt.plot(reals, p.ppf(reals.reshape(-1, 1)), label="ppf(x)")
plt.legend(loc="best")
plt.show()
In [8]:
p.rvs(n_samples=10000)
Out[8]:
In [9]:
a = theano.shared(1.0, name="a")
b = T.constant(0.5, name="b")
p = Normal(mu=a * b, sigma=2.0)
Note: Under the hood, if a parameter is specified as a scalar, then it is in fact automatically embedded into a Theano shared variable.
In [10]:
# Parameters are Theano symbolic expressions
print(type(p.mu))
print(type(p.sigma)) # sigma=2.0 was embedded into a shared variable
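Since sigma was embedded into a shared variable, its value can also be updated in place after construction. A minimal sketch, assuming only standard Theano shared-variable semantics (set_value/get_value):
p.sigma.set_value(3.0)      # update the embedded parameter in place
print(p.sigma.get_value())
p.sigma.set_value(2.0)      # restore the original value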
By construction, p.mu is a Theano symbolic expression which depends on several inputs, in this case a and b. Accordingly, the actual variables (or hyper-parameters) that fully define p are the shared variables a (created explicitly) and sigma (created implicitly from the scalar 2.0). In particular, mu is not a hyper-parameter of p, since it is itself defined from the variable a and the constant b.
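As a quick check, p.mu can be evaluated directly since all of its inputs (the shared variable a and the constant b) carry values; the sketch below simply uses Theano's standard eval() and should print a * b = 1.0 * 0.5 = 0.5:
print(p.mu.eval())  # evaluates the symbolic expression a * b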
In terms of API, the inputs of all expressions specified at initialization together form the hyper-parameters of the distribution, provided they are Theano shared variables. Hyper-parameters are stored in p.parameters_, while constants are stored in p.constants_.
In [11]:
p.parameters_ # all input parameters (note that mu is not part of those!)
Out[11]:
In [12]:
p.constants_ # all input constants
Out[12]:
Additionally, parameter expressions can be defined in terms of free Theano variables that are not (yet) tied to any value. These auxiliary inputs will need to be passed at evaluation. All required extra inputs are stored in p.observeds_.
In [13]:
a = T.dmatrix(name="a") # free input to be specified at evaluation
b = theano.shared(-1.0, name="b")
c = theano.shared(1.0, name="c")
p = Normal(mu=a*b + c)
In [14]:
p.parameters_
Out[14]:
In [15]:
p.constants_
Out[15]:
In [16]:
p.observeds_
Out[16]:
In [17]:
p.pdf(X=np.array([[0.0], [0.0]]),
      a=np.array([[1.0], [2.0]]))  # specify the auxiliary input `a` at evaluation
Out[17]:
In [18]:
# Plot pdf(x, a)
import mpl_toolkits.mplot3d.axes3d as axes3d
fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
Xs = np.arange(-5, 5, 0.25)
As = np.arange(-5, 5, 0.25)
Xs, As = np.meshgrid(Xs, As)
Ps = p.pdf(X=Xs.reshape(-1, 1),
           a=As.reshape(-1, 1))
Ps = Ps.reshape(Xs.shape)
ax.plot_wireframe(Xs, As, Ps, rstride=4, cstride=4, alpha=0.5)
ax.set_xlabel("x")
ax.set_ylabel("a")
ax.set_zlabel("p(x, a)")
plt.show()
In [19]:
from carl.distributions import Mixture
components = [
    Normal(mu=-2.0, sigma=0.75),  # c0
    Normal(mu=0.0, sigma=2.0),    # c1
    Normal(mu=1.0, sigma=0.5)     # c2 (bump)
]
g = theano.shared(0.05, name="g")
p = Mixture(components=components, weights=[0.5 - g / 2., 0.5 - g / 2., g])
In [20]:
p.parameters_ # union of all component parameters + g
Out[20]:
In [21]:
reals = np.linspace(-5, 5, num=1000)
plt.plot(reals, p.pdf(reals.reshape(-1, 1)), label="pdf(x)")
plt.legend()
plt.show()
In [22]:
reals = np.linspace(-5, 5, num=1000)
plt.plot(reals, p.cdf(reals.reshape(-1, 1)), label="cdf(x)")
plt.legend()
plt.show()
In [23]:
p.weights
Out[23]:
In [24]:
p.compute_weights()
Out[24]:
Note that weights are automatically normalized such that they sum to 1.
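This can be checked directly; the one-liner below is a small sketch that only assumes compute_weights() returns the evaluated weights as an array-like:
print(np.sum(p.compute_weights()))  # should print 1.0 (up to numerical precision)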
Finally, distribution objects also implement a fit method for fitting all parameters from p.parameters_ by maximum likelihood. The Theano expression engine is used to symbolically derive the gradient of the objective function with respect to the parameters. This gradient function is then used to guide the underlying optimization algorithm. If needed, bounds and constraints can also be specified.
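To make this mechanism concrete, the sketch below shows how Theano can symbolically derive such a gradient, here for the negative log-likelihood of a unit-variance normal with respect to its location parameter. This is only an illustration of the idea, not carl's actual internals:
mu = theano.shared(0.0, name="mu")
x = T.dmatrix(name="x")
# negative log-likelihood of Normal(mu, 1) over the samples in x
nll = -T.sum(T.log(T.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)))
dnll_dmu = theano.function([x], theano.grad(nll, mu))
print(dnll_dmu(np.array([[1.0], [2.0]])))  # the kind of gradient used to guide the optimizer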
In [25]:
# Target distribution
p0 = Mixture(components=[Normal(mu=1.0, sigma=1.0), Normal(mu=4.0, sigma=1.0)],
             weights=[0.7, 0.3])
# Fit components[0].mu and mixture weights, freeze all others
w = theano.shared(0.5, name="w")
p1 = Mixture(components=[Normal(mu=0.0, sigma=T.constant(1.0)),
                         Normal(mu=T.constant(4.0), sigma=T.constant(1.0))],
             weights=[w, 1.0 - w])
p1.parameters_
Out[25]:
In [26]:
X = p0.rvs(10000)
p1.fit(X, bounds=[{"param": w, "bounds": (0.5, 1.0)}], use_gradient=False)
Out[26]:
In [27]:
p1.components[0].mu.eval()
Out[27]:
In [28]:
p1.compute_weights()
Out[28]:
In [29]:
reals = np.linspace(-5, 5, num=1000)
plt.hist(X.ravel(), bins=100, normed=1, alpha=0.5, label="x~p0")
plt.plot(reals, p1.pdf(reals.reshape(-1, 1)), label="p1(x)")
plt.legend()
plt.show()